An Efficient Technique for De-Noising Sentences using Monolingual Corpus and Synonym Dictionary

Authors

  • Sanjay Chatterji
  • Diptesh Chatterjee
  • Sudeshna Sarkar
Abstract

We describe a method of correcting noisy output of a machine translation system. Our idea is to consider different phrases of a given sentence, and find appropriate replacements of some of these from the frequently occurring similar phrases in the monolingual corpus. The frequent phrases in the monolingual corpus are indexed by a search engine. When looking for similar phrases we consider phrases containing words that are spelling variations of or are similar in meaning to the words in the input phrase. We use a framework where we can consider different ways of splitting a sentence into short phrases and combining them so as to get the best replacement sentence that tries to preserve the meaning meant to be conveyed by the original sentence.
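The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy phrase table `PHRASE_COUNTS` stands in for the search-engine index of frequent phrases, `VARIANTS` stands in for the synonym and spelling-variation dictionary, and the scoring rule (corpus count times phrase length) and all function names are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical toy stand-in for the indexed frequent phrases of a monolingual corpus.
PHRASE_COUNTS = {
    ("he", "went"): 50,
    ("went", "to"): 30,
    ("to", "the", "market"): 40,
    ("the", "market"): 60,
}

# Hypothetical synonym / spelling-variation dictionary.
VARIANTS = {
    "wnt": {"went"},
    "bazaar": {"market"},
}


def candidates(phrase):
    """Yield (replacement, count) for corpus phrases whose words equal the
    input words or are listed as their variants."""
    options = [{w} | VARIANTS.get(w, set()) for w in phrase]
    for cand in product(*options):
        if cand in PHRASE_COUNTS:
            yield cand, PHRASE_COUNTS[cand]


def denoise(words, max_len=3):
    """Try every split of the sentence into phrases of up to max_len words
    and return the highest-scoring combination of replacements."""
    n = len(words)
    # best[i] holds (score, output words) for the prefix words[:i].
    best = [(0, [])] + [None] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            phrase = tuple(words[j:i])
            # If no similar corpus phrase exists, keep the phrase unchanged.
            opts = list(candidates(phrase)) or [(phrase, 0)]
            for cand, count in opts:
                # Assumed scoring: favour frequent, longer phrases.
                score = best[j][0] + count * len(cand)
                if best[i] is None or score > best[i][0]:
                    best[i] = (score, best[j][1] + list(cand))
    return best[n][1]


if __name__ == "__main__":
    print(" ".join(denoise("he wnt to the bazaar".split())))
```

With this toy data, the noisy input "he wnt to the bazaar" is rewritten as "he went to the market". A real system would need a far larger phrase index and a scoring function that also accounts for how well each replacement preserves the original meaning.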


Similar resources

Optimizing Synonym Extraction Using Monolingual and Bilingual Resources

Automatically acquiring synonymous words (synonyms) from corpora is a challenging task. For this task, methods that use only one kind of resources are inadequate because of low precision or low recall. To improve the performance of synonym extraction, we propose a method to extract synonyms with multiple resources including a monolingual dictionary, a bilingual corpus, and a large monolingual c...


Automatic Discovery of Similar Words

We deal with the issue of automatic discovery of similar words (synonyms and near-synonyms) from different kinds of sources: from large corpora of documents, from the Web, and from monolingual dictionaries. We present in detail three algorithms that extract similar words from a large corpus of documents and consider the specific case of the World Wide Web. We then describe a recent method of aut...


Performance Enhancement of GPS/INS Integrated Navigation System Using Wavelet Based De-noising method

Accuracy of inertial navigation system (INS) is limited by inertial sensors imperfections. Before using inertial sensors signals in the data fusion algorithm, noise removal method should be performed, in which, wavelet decomposition method is used. In this method the raw data is decomposed into high and low frequency data sets. In this study, wavelet multi-level resolution analysis (WMRA) techn...


Developing Monolingual English Corpus for Plagiarism Detection using Human Annotated Paraphrase Corpus

In this paper, we describe an approach to create monolingual English plagiarism detection corpus for the task of text alignment corpus construction in PAN 2015 competition. We propose two different obfuscation methods to fragment obfuscation for creating the cases of plagiarism. The first method is an artificial obfuscation which consists of variety of obfuscation strategies such as synonym sub...


De-Noising SPECT Images from a Typical Collimator Using Wavelet Transform

Introduction: SPECT is a diagnostic imaging technique the main disadvantage of which is the existence of Poisson noise. So far, different methods have been used by scientists to improve SPECT images. The Wavelet Transform is a new method for de-noising which is widely used for noise reduction and quality enhancement of images. The purpose of this paper is evaluation of noise reduction in SPECT ...




Publication date: 2012